
[opt] Enable CFG optimization for local tensors #3237

Merged
merged 5 commits into from
Oct 21, 2021

Conversation

strongoier
Contributor

@strongoier strongoier commented Oct 20, 2021

Related issue = #2590, #2637, #3218, #3228

Although #2637 introduced local tensors, the corresponding CFG optimizations were never implemented, so many kernels ended up with redundant local memory allocations, loads, and stores. This PR enables CFG optimization for local tensors, which also eliminates the overhead of #3218 and #3228. Here is a tiny example:

import taichi as ti

ti.init(dynamic_index=True, print_preprocessed=True, print_ir=True)

@ti.kernel
def my_func():
    a = ti.Vector([1, 2])
    print(a.sum())

my_func()

Before this PR, the final IR for the kernel is:

kernel {
  $0 = offloaded  
  body {
    <[Tensor (2) i32]> $1 = alloca
    <i32> $2 = const [1]
    <i32> $3 = const [0]
    <*i32> $4 = shift ptr [$1 + $3]
    <i32> $5 : local store [$4 <- $2]
    <i32> $6 = const [2]
    <i32> $7 = const [4]
    <*i32> $8 = shift ptr [$1 + $7]
    <i32> $9 : local store [$8 <- $6]
    <i32> $10 = local load [ [$4[0]]]
    <i32> $11 = local load [ [$8[0]]]
    <i32> $12 = add $10 $11
    print $12, "\n"
  }
}

After this PR, the final IR for this kernel is:

kernel {
  $0 = offloaded  
  body {
    <i32> $1 = const [3]
    print $1, "\n"
  }
}
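The following toy Python model makes the before/after IR concrete. It walks through the steps that turn the first listing into the second: store-to-load forwarding, constant folding, and dead store / dead code elimination. Everything here is hypothetical and for illustration only. The tuple-based IR, the pass names `forward_and_fold` and `eliminate_dead`, and the `live_out` parameter are not Taichi's C++ passes. The model also makes several simplifying assumptions. It handles a single basic block. Offsets must be constant and point into equally sized elements. Every pointer must come from `shift_ptr` on an alloca.

```python
from typing import Dict, List, Set, Tuple

# Toy IR (hypothetical): (name, op, args). Names mirror the "$n" labels in the
# first dump; the print statement gets a made-up name "$13".
Stmt = Tuple[str, str, tuple]

BEFORE: List[Stmt] = [
    ("$1", "alloca", ()),
    ("$2", "const", (1,)),
    ("$3", "const", (0,)),
    ("$4", "shift_ptr", ("$1", "$3")),
    ("$5", "local_store", ("$4", "$2")),
    ("$6", "const", (2,)),
    ("$7", "const", (4,)),
    ("$8", "shift_ptr", ("$1", "$7")),
    ("$9", "local_store", ("$8", "$6")),
    ("$10", "local_load", ("$4",)),
    ("$11", "local_load", ("$8",)),
    ("$12", "add", ("$10", "$11")),
    ("$13", "print", ("$12",)),
]


def forward_and_fold(body: List[Stmt]) -> List[Stmt]:
    """Store-to-load forwarding plus constant folding over one basic block."""
    consts: Dict[str, int] = {}
    ptrs: Dict[str, Tuple[str, int]] = {}    # pointer -> (alloca, constant offset)
    memory: Dict[Tuple[str, int], str] = {}  # (alloca, offset) -> last stored value
    forwarded: Dict[str, str] = {}           # removed load -> value it would read
    out: List[Stmt] = []
    for name, op, args in body:
        # Rewrite uses of forwarded loads to the value that was stored.
        args = tuple(forwarded.get(a, a) if isinstance(a, str) else a for a in args)
        if op == "const":
            consts[name] = args[0]
        elif op == "shift_ptr" and args[1] in consts:
            ptrs[name] = (args[0], consts[args[1]])
        elif op == "local_store":
            if args[0] in ptrs:
                # Same alloca, different constant offset => different slot,
                # assuming equally sized elements (no partial overlap).
                memory[ptrs[args[0]]] = args[1]
            else:
                memory.clear()  # unknown target: forget everything
        elif op == "local_load" and ptrs.get(args[0]) in memory:
            forwarded[name] = memory[ptrs[args[0]]]
            continue  # drop the load; its users now read the stored value
        elif op == "add" and all(a in consts for a in args):
            consts[name] = consts[args[0]] + consts[args[1]]
            out.append((name, "const", (consts[name],)))
            continue
        # Note: "alloca" never writes to `memory`, i.e. it is not treated as a store.
        out.append((name, op, args))
    return out


def eliminate_dead(body: List[Stmt], live_out: Set[str]) -> List[Stmt]:
    """Drop stores whose alloca is never read, then pure statements nobody uses.

    `live_out` holds allocas visible to other offloaded tasks; stores to them
    must stay. Assumes every pointer is a shift_ptr of an alloca in `body`.
    """
    origin = {name: args[0] for name, op, args in body if op == "shift_ptr"}
    needed = {origin[args[0]] for _, op, args in body if op == "local_load"} | live_out
    kept = [(name, op, args) for name, op, args in body
            if not (op == "local_store" and origin[args[0]] not in needed)]
    pure = {"const", "alloca", "shift_ptr", "add", "local_load"}
    while True:  # repeat until no more unused pure statements can be removed
        used = {a for _, _, args in kept for a in args if isinstance(a, str)}
        pruned = [s for s in kept if s[1] not in pure or s[0] in used]
        if len(pruned) == len(kept):
            return pruned
        kept = pruned
```

With `live_out=set()`, the result is just the folded constant `3` followed by the `print`, which matches the second listing. Passing `live_out={"$1"}` mimics exposing the alloca to another offloaded task. In that case both local stores stay alive, along with everything they depend on.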

Details:

  1. Add a less conservative alias analysis for PtrOffsetStmt that can conclude that two pointers are definitely the same or definitely different.
  2. Stop treating a TensorType alloca as a store, which enables store-to-load forwarding. This is safe because local tensors currently must be initialized with values (e.g. ti.Vector([1, 2, 3])), so every element is written by an explicit local store before it is read. The TensorType alloca itself is therefore no longer treated as a valid forwarding source.
  3. Stop exposing a PtrOffsetStmt whose origin is an alloca to other offloaded tasks (such a pointer should not appear in the final node's live_in set), which enables dead store elimination.
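The alias query described in item 1 can be sketched roughly as follows. Like the toy passes above, this is a hypothetical model: `AliasResult`, `Ptr`, and `alias_ptr_offsets` are illustrative names, not Taichi's API. Offsets are treated as byte offsets. That reading is consistent with `const [0]` and `const [4]` addressing the two i32 elements in the first IR dump, but it is an assumption. A `None` offset stands for a dynamic index such as those enabled by `dynamic_index=True`.

```python
from enum import Enum
from typing import Optional, Set, Tuple


class AliasResult(Enum):
    SAME = "definitely same"
    DIFFERENT = "definitely different"
    UNCERTAIN = "uncertain"


# A pointer is (origin, offset): `origin` names the base statement and `offset`
# is a constant byte offset, or None when the index is only known at runtime.
Ptr = Tuple[str, Optional[int]]


def alias_ptr_offsets(p: Ptr, q: Ptr, allocas: Set[str]) -> AliasResult:
    origin_p, off_p = p
    origin_q, off_q = q
    if origin_p != origin_q:
        # Two distinct local allocas never overlap; other bases (e.g. global
        # pointers) might, so stay conservative there.
        if origin_p in allocas and origin_q in allocas:
            return AliasResult.DIFFERENT
        return AliasResult.UNCERTAIN
    if off_p is None or off_q is None:
        return AliasResult.UNCERTAIN  # dynamic index: cannot decide statically
    # Same base, both offsets constant. Assumes uniformly sized elements,
    # so unequal offsets never partially overlap.
    return AliasResult.SAME if off_p == off_q else AliasResult.DIFFERENT
```

Returning `UNCERTAIN` instead of guessing keeps the analysis sound. Forwarding a stored value or deleting a store is only allowed when the answer is definite.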



@k-ye k-ye requested a review from xumingkuan October 21, 2021 05:55
Contributor

@ailzhang ailzhang left a comment


Nice catch! Thanks!

Member

@k-ye k-ye left a comment


Nice!

@k-ye k-ye merged commit 53e04c6 into taichi-dev:master Oct 21, 2021
Contributor

@xumingkuan xumingkuan left a comment


LGTM!
(Just curious, is it possible that there is a local tensor without an initial value?)

@strongoier
Contributor Author


Currently it is impossible. We may support that in the future, in which case the CFG part will be updated accordingly.


5 participants